Regular Expression

Regular expressions are pattern-matching utilities available in most programming languages. A regular expression defines a pattern that is matched against a sequence of input characters, and regexes are widely used in text parsing and search. In Scala, the Regex class lives in the scala.util.matching package.

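import scala.util.matching.Regex

object Demo {
   def main(args: Array[String]): Unit = {
      // Calling r on a string builds a Regex from it
      val p = "Functional".r
      val st = "Scala is a Functional Programming Language"

      // findFirstIn returns an Option[String]; this prints Some(Functional)
      println(p findFirstIn st)
   }
}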

In the above example we search for the word “Functional”. Calling r on a string (available through an implicit conversion to StringOps) returns a Regex built from that string. The findFirstIn method returns the first occurrence of the pattern. To find all occurrences, use the findAllIn method.

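If there is a match, findFirstIn returns it wrapped in an Option (Some(...)); otherwise it returns None. The findAllIn method returns an iterator over the matches. To get the matched text as a plain string, we use mkString.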
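
As a minimal sketch of handling the result of findFirstIn, the snippet below unwraps the Option with getOrElse. The object name FirstMatchSketch, the helper describe, and the second input string are illustrative assumptions, not part of the original example.

```scala
import scala.util.matching.Regex

object FirstMatchSketch {
  val pattern: Regex = "Functional".r
  val text = "Scala is a Functional Programming Language"

  // findFirstIn returns Option[String]: Some(match) or None
  val found: Option[String] = pattern.findFirstIn(text)
  val missing: Option[String] = pattern.findFirstIn("Scala is concise")

  // Hypothetical helper: unwraps the Option with a fallback value
  def describe(result: Option[String]): String =
    result.getOrElse("no match")

  def main(args: Array[String]): Unit = {
    println(found)             // Some(Functional)
    println(describe(found))   // Functional
    println(describe(missing)) // no match
  }
}
```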

The mkString method concatenates the matches into a single string, optionally with a separator between them. The pipe symbol (|) specifies an OR condition. For example, (S|s)cala matches the word ‘Scala’ written with either a capital or a lowercase ‘S’. Instead of calling r on a string, the Regex constructor can be used.

Consider an example using the Regex constructor:

import scala.util.matching.Regex

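object multipleoccurence {
   def main(args: Array[String]): Unit = {
      // (S|s) matches either an uppercase or a lowercase first letter
      val p = new Regex("(S|s)tudent")
      val st = "Student Id is unique. Students are interested in learning new things"

      // findAllIn returns an iterator; mkString joins the matches with commas
      println((p findAllIn st).mkString(","))
   }
}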

The main method above produces the following output:

Student,Student
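
For a single-character choice like this, a character class is an alternative to the pipe. The sketch below compares the two forms on a made-up sentence. The object name AlternationSketch and the sample text are assumptions for illustration only.

```scala
object AlternationSketch {
  // Two ways to accept either case of the first letter
  val withPipe = "(S|s)tudent".r
  val withClass = "[Ss]tudent".r

  val text = "Each student gets a Student Id"

  // toList materializes the iterator returned by findAllIn
  val pipeMatches: List[String] = withPipe.findAllIn(text).toList
  val classMatches: List[String] = withClass.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(pipeMatches.mkString(","))  // student,Student
    println(classMatches.mkString(",")) // student,Student
  }
}
```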

The replaceFirstIn method replaces the first occurrence of the pattern, and replaceAllIn replaces all occurrences.

Consider an example below.

object Replace {
   def main(args: Array[String]): Unit = {
      val p = "Car".r
      val st = "Car has power windows"

      // Replaces only the first match; prints: Alto has power windows
      println(p.replaceFirstIn(st, "Alto"))
   }
}
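
The example above only exercises replaceFirstIn. As a rough sketch of the difference from replaceAllIn, the snippet below runs both on an input with two occurrences. The object name ReplaceAllSketch and the longer sample sentence are assumptions added for illustration.

```scala
object ReplaceAllSketch {
  val pattern = "Car".r
  val text = "Car has power windows. Car is small."

  // Only the first occurrence is replaced
  val first: String = pattern.replaceFirstIn(text, "Alto")

  // Every occurrence is replaced
  val all: String = pattern.replaceAllIn(text, "Alto")

  def main(args: Array[String]): Unit = {
    println(first) // Alto has power windows. Car is small.
    println(all)   // Alto has power windows. Alto is small.
  }
}
```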

The table below lists common regular expression syntax. Backslashes are doubled because that is how they are written inside an ordinary Scala string literal.

Subexpression     Matches
^                 Start of the line.
$                 End of the line.
.                 Any single character except a newline.
[…]               Any single character inside the brackets.
[^…]              Any single character not inside the brackets.
\\A               Start of the entire string.
\\z               End of the entire string.
\\Z               End of the entire string, ignoring a final line terminator if one exists.
re*               Zero or more occurrences of the preceding expression.
re+               One or more occurrences of the preceding expression.
re?               Zero or one occurrence of the preceding expression.
re{n}             Exactly n occurrences of the preceding expression.
re{n,}            n or more occurrences of the preceding expression.
re{n,m}           At least n and at most m occurrences of the preceding expression.
q|r               Either q or r.
(re)              Groups regular expressions and remembers the matched text.
(?:re)            Groups regular expressions without remembering the matched text.
(?>re)            An independent pattern, matched without backtracking.
\\w               A word character.
\\W               A non-word character.
\\s               A whitespace character; equivalent to [ \t\n\x0B\f\r].
\\S               A non-whitespace character.
\\d               A digit, i.e. [0-9].
\\D               A non-digit.
\\G               The point where the previous match ended.
\\n               Back-reference to capture group number n (n is a digit).
\\b               A word boundary when used outside brackets.
\\B               A non-word boundary.
\\n, \\t, etc.    Newlines, tabs, etc.
\\Q               Quotes (escapes) every character up to \\E.
\\E               Ends the quoting started by \\Q.
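
A few of these subexpressions can be tried out directly. The following is a small sketch, not an exhaustive tour. The object name SubexpressionSketch and the sample text are assumptions for illustration. It also shows that a triple-quoted string avoids the doubled backslashes.

```scala
object SubexpressionSketch {
  val text = "Order 66 shipped 3 crates on day 12"

  // In a normal string the backslash is doubled; in a triple-quoted string it is not
  val digitsEscaped = "\\d+".r
  val digitsRaw = """\d+""".r

  val numbersEscaped: List[String] = digitsEscaped.findAllIn(text).toList
  val numbersRaw: List[String] = digitsRaw.findAllIn(text).toList // List(66, 3, 12)

  // ^ anchors the match at the start, \w+ takes one or more word characters
  val leadingWord: Option[String] = """^\w+""".r.findFirstIn(text) // Some(Order)

  // {2} asks for exactly two digits, \b keeps the match on word boundaries
  val twoDigit: List[String] = """\b\d{2}\b""".r.findAllIn(text).toList // List(66, 12)

  // \Q...\E quotes the dot, so it matches a literal '.' only
  val quoted: List[String] = """\Qa.b\E""".r.findAllIn("a.b axb").toList // List(a.b)
}
```
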
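The following REPL session shows split with a regular expression, followed by the use of a Regex as an extractor that binds capture groups to variables:

scala> val adder = "we're as similar as two dissimilar things in a pod.\n\t-blackadder"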
adder: String =
we're as similar as two dissimilar things in a pod.
        -blackadder

scala> adder.split("\\s+")
res0: Array[String] = Array(we're, as, similar, as, two, dissimilar, things, in, a, pod., -blackadder)

scala> adder.split("""\s+""")
res1: Array[String] = Array(we're, as, similar, as, two, dissimilar, things, in, a, pod., -blackadder)

scala> val name = """(mr|mrs|ms)\. ([a-z][a-z]+) ([a-z][a-z]+)""".r
name: scala.util.matching.Regex = (mr|mrs|ms)\. ([a-z][a-z]+) ([a-z][a-z]+)

scala> val name(title, first, last) = "mr. james stevens"
title: String = mr
first: String = james
last: String = stevens

scala> val name(title, first, last) = "ms. sally kenton"
title: String = ms
first: String = sally
last: String = kenton

scala> val array(title, first, last) = "mr. james stevens".split(" ")
<console>:27: error: not found: value array
       val array(title, first, last) = "mr. james stevens".split(" ")
           ^
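
The error above occurs because no extractor named array is in scope. Assuming the intent was to destructure the result of split, the standard Array extractor (capitalized) can be used, as in the sketch below. The object name ArrayExtractorSketch and the helper splitName are hypothetical. Note that, unlike the regex version, the title keeps its trailing period.

```scala
object ArrayExtractorSketch {
  // Hypothetical helper: destructures a three-word name with the Array extractor
  def splitName(input: String): Option[(String, String, String)] =
    input.split(" ") match {
      case Array(title, first, last) => Some((title, first, last))
      case _                         => None // not exactly three parts
    }

  def main(args: Array[String]): Unit = {
    println(splitName("mr. james stevens")) // Some((mr.,james,stevens))
    println(splitName("james stevens"))     // None
  }
}
```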

scala> val phone1 = """\((\d{3})\)\s*(\d{3})-(\d{4})""".r
phone1: scala.util.matching.Regex = \((\d{3})\)\s*(\d{3})-(\d{4})

scala> val phone2 = """(\d{3})-(\d{3})-(\d{4})""".r
phone2: scala.util.matching.Regex = (\d{3})-(\d{3})-(\d{4})

scala> val phone1(area, first3, last4) = "(123) 555-5555"
area: String = 123
first3: String = 555
last4: String = 5555

scala> val phone2(area, first3, last4) = "123-555-5555"
area: String = 123
first3: String = 555
last4: String = 5555
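
Binding with val throws a MatchError when the input does not match the pattern. A match expression can try several patterns and fall back safely, as in the sketch below. The object name PhoneMatchSketch and the helper normalize are illustrative assumptions; the two patterns are the ones defined above.

```scala
object PhoneMatchSketch {
  val phone1 = """\((\d{3})\)\s*(\d{3})-(\d{4})""".r
  val phone2 = """(\d{3})-(\d{3})-(\d{4})""".r

  // Hypothetical helper: reduces either format to its ten digits
  def normalize(input: String): Option[String] = input match {
    case phone1(area, first3, last4) => Some(area + first3 + last4)
    case phone2(area, first3, last4) => Some(area + first3 + last4)
    case _                           => None // neither pattern matched the whole input
  }

  def main(args: Array[String]): Unit = {
    println(normalize("(123) 555-5555")) // Some(1235555555)
    println(normalize("123-555-5555"))   // Some(1235555555)
    println(normalize("555-5555"))       // None
  }
}
```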

scala> val namesharemthreegroups = """(m(?:r|rs|s))\. ([a-z][a-z]+) ([a-z][a-z]+)""".r
namesharemthreegroups: scala.util.matching.Regex = (m(?:r|rs|s))\. ([a-z][a-z]+) ([a-z][a-z]+)

scala> val namesharemthreegroups(title, first, last) = "mr. james stevens"
title: String = mr
first: String = james
last: String = stevens
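
The non-capturing group (?:...) is what keeps this pattern at three groups. As a rough check, the sketch below counts capture groups through the underlying java.util.regex.Pattern. The object name GroupCountSketch, the helper groups, and the capturing variant of the pattern are assumptions added for comparison.

```scala
import scala.util.matching.Regex

object GroupCountSketch {
  // Same pattern twice; only the inner group differs
  val capturing = """(m(r|rs|s))\. ([a-z][a-z]+) ([a-z][a-z]+)""".r
  val nonCapturing = """(m(?:r|rs|s))\. ([a-z][a-z]+) ([a-z][a-z]+)""".r

  // Hypothetical helper: asks the compiled Java pattern how many groups it captures
  def groups(r: Regex): Int = r.pattern.matcher("").groupCount()

  def main(args: Array[String]): Unit = {
    println(groups(capturing))    // 4
    println(groups(nonCapturing)) // 3
  }
}
```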

scala> val rhymename = """(mr|mrs|ms)\. ([a-z])([a-z]+) ([a-z])\3""".r
rhymename: scala.util.matching.Regex = (mr|mrs|ms)\. ([a-z])([a-z]+) ([a-z])\3

scala> val rhymename(title, firstinitial, firstrest, lastinitial) = "mr. john bohn"
title: String = mr
firstinitial: String = j
firstrest: String = ohn
lastinitial: String = b

scala> val rhymename2 = """(mr|mrs|ms)\. ([a-z]([a-z]+)) ([a-z]\3)""".r
rhymename2: scala.util.matching.Regex = (mr|mrs|ms)\. ([a-z]([a-z]+)) ([a-z]\3)

scala> val rhymename2(title, first, _, last) = "mr. john bohn"
title: String = mr
first: String = john
last: String = bohn
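
The back-reference \3 only matches when the last name repeats the text captured by the third group. The sketch below wraps the rhymename pattern from the session in a predicate to show a failing case as well. The object name RhymeSketch, the helper rhymes, and the extra sample names are illustrative assumptions.

```scala
object RhymeSketch {
  val rhymename = """(mr|mrs|ms)\. ([a-z])([a-z]+) ([a-z])\3""".r

  // Hypothetical helper: true when the whole input matches, including the back-reference
  def rhymes(name: String): Boolean = name match {
    case rhymename(_, _, _, _) => true
    case _                     => false
  }

  def main(args: Array[String]): Unit = {
    println(rhymes("mr. john bohn"))  // true
    println(rhymes("mr. john smith")) // false
  }
}
```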
